Dialogue Act Recognition for Text-based Sinhala

Authors

  • Sudheera Palihakkara
  • Dammina Sahabandu
  • Ahsan Shamsudeen
  • Chamika Bandara
  • Surangika Ranathunga
Abstract

This paper discusses the application of classical machine learning approaches to Dialogue Act Recognition for text-based Sinhala. A study was carried out to identify a dialogue act tag set for Sinhala. A new corpus was created from Sinhala subtitles for English movies and annotated with the selected dialogue acts. The dialogue act recognition system was evaluated using features previously used for English, plus newly identified features for Sinhala. Although Sinhala is an under-resourced language that lacks even basic tools such as a PoS tagger, we achieved good classification accuracy by exploiting Sinhala-specific features. As far as we are aware, this is the first research on dialogue act recognition for the Indo-Iranian language family.

Similar resources

Impact of ASR N-Best Information on Bayesian Dialogue Act Recognition

A challenge in dialogue act recognition is the mapping from noisy user inputs to dialogue acts. In this paper we describe an approach for re-ranking dialogue act hypotheses based on Bayesian classifiers that incorporate dialogue history and Automatic Speech Recognition (ASR) N-best information. We report results based on the Let’s Go dialogue corpora that show (1) that including ASR N-best info...

NLP Applications of Sinhala: TTS & OCR

This paper brings together the practical applications and the evaluation of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and an Optical Character Recognition system for Sinhala.

Leave One Out Experiments for Statistical Dialogue Act Recognition

We describe a corpus-based statistical approach to dialogue act recognition used in a speech translation system. We present the annotated corpus used for training and testing, the statistical method, and the results of leave-one-out experiments. With this memo, we finish our work on statistical dialogue act recognition in Verbmobil.

Speaker Adaptation Applied to Sinhala Speech Recognition

Sinhala, which is the main spoken language of the majority of Sri Lanka, is an under-resourced language. The Sinhala language is new to the speech recognition research field and faces the problem of not having suitable speech corpora available. For a language like Sinhala, it is essential to find ways of developing good recognition models using fewer data samples. Speaker Adaptive methods prov...

Off-Line Sinhala Handwriting Recognition Using Hidden Markov Models

This paper describes a method to recognize off-line handwritten characters of Sinhala, the language used by the majority of Sri Lanka. The classification approach is based on discrete hidden Markov models. A subset of the Sinhala alphabet was chosen for the study. The unknown characters are first pre-classified into one of three character groups, based on the structural properties of the text line...

Publication date: 2015